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Abstract 

Background: Non-small cell lung carcinoma (NSCLC) is a complex malignancy that owing to its heterogeneity and 
poor prognosis poses many challenges to diagnosis, prognosis and patient treatment. DNA methylation is an 
important mechanism of epigenetic regulation involved in normal development and cancer. It is a very stable and 
specific modification and therefore in principle a very suitable marker for epigenetic phenotyping of tumors. Here 
we present a genome-wide DNA methylation analysis of NSCLC samples and paired lung tissues, where we 
combine MethylCap and next generation sequencing (MethylCap-seq) to provide comprehensive DNA methylation 
maps of the tumor and paired lung samples. The MethylCap-seq data were validated by bisulfite sequencing and 
methyl-specific polymerase chain reaction of selected regions. 

Results: Analysis of the MethylCap-seq data revealed a strong positive correlation between replicate experiments 
and between paired tumor/lung samples. We identified 57 differentially methylated regions (DMRs) present in all 
NSCLC tumors analyzed by MethylCap-seq. While hypomethylated DMRs did not correlate to any particular 
functional category of genes, the hypermethylated DMRs were strongly associated with genes encoding 
transcriptional regulators. Furthermore, subtelomeric regions and satellite repeats were hypomethylated in the 
NSCLC samples. We also identified DMRs that were specific to two of the major subtypes of NSCLC, 
adenocarcinomas and squamous cell carcinomas. 

Conclusions: Collectively, we provide a resource containing genome-wide DNA methylation maps of NSCLC and 
their paired lung tissues, and comprehensive lists of known and novel DMRs and associated genes in NSCLC. 
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Background 

Non-small cell lung carcinoma (NSCLC) is a common 
malignancy characterized by a worldwide high incidence 
and low survival rate [1]. NSCLC is a heterogenic disease 
which is broadly classified into three major histopatho- 
logical subtypes: adenocarcinoma (ADC), squamous cell 
carcinoma (SCC) and large cell carcinoma (LCC). This 
heterogeneity poses challenges for diagnosis and 
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treatment, since each subtype presents with a distinctive 
prognosis [2] and the choice of therapeutic regimen is 
predominantly based on tumor subtype and staging para- 
meters [3]. The development of personalized diagnostics 
and therapy is leading the way to a new era that may see 
us overcome some of the difficulties in treating complex 
diseases such as NSCLC. 

In the past decade, comparative gene expression profiles 
of tumors have been extensively studied [4-6], yielding 
useful insights into the molecular hallmarks of carcino- 
genesis [7,8]. With the advent of next generation sequen- 
cing, genome-wide screening has become an attractive 
tool for profiling tumors versus lung tissues [7,9]. DNA 
methylation is a very stable epigenetic mark and next gen- 
eration sequencing studies have recently shown that many 
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genes are aberrantly methylated in cancer [10,11]. Tissue 
specific DNA methylation patterns are stabilized during 
embryonic development, and faithfully maintained 
through cell divisions [12-14]. Nevertheless, established 
methylation patterns can be reprogrammed, with tumor 
cells undergoing DNA demethylation and de novo methy- 
lation through mechanisms not yet completely under- 
stood. CDKN2A and RASSF1 are examples of genes found 
to be aberrantly methylated in a wide variety of tumors 
[15-18], and epigenetic silencing of these genes has also 
been reported in NSCLC [19-22]. The great majority of 
DNA methylation studies are concentrated on the analysis 
of CpG islands located in the promoter areas of pre- 
selected genes. However, differentially methylated areas 
may be located within genes and at large distances from 
the nearest neighboring genes [23,24]. Although data on 
methylated genes in NSCLC are rapidly accumulating, un- 
biased data concerning specificity of the genome-wide dis- 
tribution of methylated loci are still scarce. 

In this study, we used Methyl-DNA Capture (Methyl- 
Cap) and high-throughput sequencing (MethylCap-seq, 
[25]) to perform a genome-wide DNA methylation 
screening of NSCLC tumors and paired adjacent lung 
tissues. With this approach, we sought to identify 
genome- wide aberrant methylation patterns of NSCLC. 
Specific differentially methylated regions would be 
promising candidate molecular markers for non-invasive 
diagnostics using circulating tumor DNA, and increase 
the number of possible targets for epigenetic therapy. 

Results 

Methylation profiles in NSCLC-study outline 

We performed genome-wide DNA methylation analysis of 
NSCLC using the MethylCap assay followed by high- 
throughput sequencing: MethylCap-seq (Figure 1A). We 
used DNA isolated from seven NSCLC tumors and paired 
lung tissues. Data regarding the samples used in this study 
can be found in Table 1 and Additional file 1. As controls, 
we prepared fully methylated and fully unmethylated gen- 
omic DNA. The DNA samples were sheared and then 
enriched for methylated DNA using the MethylCap pro- 
cedure [25]. This is based on the capture of methylated 
DNA by biotinylated methyl-binding domain protein 
(MBD), which is then retrieved by binding to 
strep tavidin-coated beads (Figure IB). The recovered 
DNA was directly sequenced using the Illumina Genome 
Analyzer IIx next generation sequencing platform. For 
each DNA sample, we performed two independent en- 
richment procedures and sequence runs. Total numbers 
of sequence reads, mapped reads and unique reads for 
each sample are represented in Additional file 2. For th 
identification of differentially methylated regions (DMRs) 
we employed a rigorous normalization procedure and a 



variety of bioinformatics tools (Figure 1C). We used 
MethylCap-seq data obtained from artificially prepared 
fully unmethylated and fully methylated DNA samples 
for normalization of the data and assignment of DMRs. 
The bioinformatics approach is described in detail in 
Additional file 3 and Additional file 4. Relative methyla- 
tion scores were used to build an individual methylation 
profile for each sample. Subsequently, we compared the 
profiles to identify highly significant DMRs between 
tumors and paired lung tissues, and between the sub- 
types of tumors. Finally, we validated the MethylCap-seq 
results by bisulfite sequencing and methyl-specific PCR 
of selected DMRs. 



Global analysis of genome-wide methylation patterns of 
NSCLC 

To assess the reproducibility of the MethylCap-seq pro- 
cedure, we first performed a self/self comparison of the 
replicate experiments. This yielded an average Pearson s 
correlation coefficient of 0.89 (Figure 2A and Additional 
file 5), indicating excellent reproducibility between inde- 
pendent experiments. Next, we compared methylation 
signal in tumors with matched healthy lung tissues; the 
average correlation coefficient between the two was 
0.83, indicating a generally high similarity in the methy- 
lation patterns of the matched tumor and lung samples 
(Figure 2B and Additional file 6). Collectively, these 
analyses establish the high reproducibility and specificity 
of the MethylCap-seq procedure that we used. To allow 
visual inspection of the methylation signal and DMRs, 
raw reads and log P-values of differential methylation 
were uploaded in the University of California, Santa Cruz 
(UCSC) Genome browser, where DMRs can be visually 
inspected (Figure 2C). To focus on the most significant 
DMRs, we used a very stringent P-value threshold of 
10" 18 , corresponding to a false discovery rate < 10" 15 
(Additional file 7). Non-repetitive regions with P- value 
<10" 18 were projected onto the genome and visua- 
lized as genome-wide maps of differential methylation 
(Figure 2D and Additional file 8). This revealed that in 
the non-repetitive fraction of the genome, tumors dis- 
played higher overall hypermethylation than the match- 
ing lung tissues. This is consistent with the notion that 
de novo methylation and hypermethylation of promoter 
CpG islands are associated with carcinogenesis [27] . 

The DMRs were analyzed according to general hall- 
marks of their genomic localization (Figure 2E). Com- 
pared with the general distribution of randomly sampled 
regions across the genome (label "Random regions"), we 
found that promoter areas were particularly enriched 
among the DMRs, both hypo- (P=5.7 x 10" 3 ) and hyper- 
methylated (p=53 x 10" 4 ). Although we removed most 
signals from repetitive regions by keeping only uniquely 
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Figure 1 Experimental design for profiling of DNA methylation patterns in non-small cell lung carcinoma. (A) Overall view of the steps 
followed to generate the profiles (ADC: adenocarcinoma; LCC: large cell carcinoma; MBD: methyl-binding domain protein; N: Lung; SCC: 
squamous cell carcinoma; T: tumor), (B) MethylCap using methyl-binding domain proteins: Sheared genomic DNA is used as input fraction 
(methyl groups in red) and incubated with beads [26], coated with streptavidin (green)-biotin (yellow)) and linked to methyl-binding domain 
protein (blue) to capture methylated DNA. Captured fragments are subjected to high-throughput sequencing. (C) Summary of the bioinformatics 
approach used to generate the methylation profiles. 



mapped reads, the signals from their uniquely mappable 
reads and flanking regions were still above background 
and allowed us to test for differential methylation. This 
revealed that some specific repeat classes such as LINE 
and satellites were also differentially represented in the 



DMRs, with satellite repeats being particularly hypo- 
methylated in tumors when compared with lung tissues 
(P=13 x 10" 5 ). By contrast, LINE repeats are relatively 
underrepresented in the DMRs (hypo and hypermethyla- 
tion), indicating that the methylation status of the bulk 



Table 1 Data from patients used for MethylCap-seq and bisulfite sequencing validation 



Sample ID 


Tissue type 


Patient ID 


Stage 


Overall survival (month) Survival status 


Gender 


Age at diagnosis 


Ethnicity 


2213 N 


healthy 


SCC1 


IIB 


2,43 


Alive 


M 


54,54 


Caucasian 


2214T 


SCC 
















2235 N 


healthy 


SCC2 


IIB 


51,23 


Deceased 


M 


73,73 


Caucasian 


2236 T 


SCC 
















2245 N 


healthy 


ADC1 


IB 


12,57 


Alive 


F 


66,26 


Caucasian 


2246 T 


ADC 
















2255 N 


healthy 


ADC2 


IA 


98,93 


Deceased 


F 


54,26 


Caucasian 


2256 T 


ADC 
















2257 N 


healthy 


ADC3 


N/A 


N/A 


N/A 


M 


78,96 


Caucasian 


2258 T 


ADC 
















2261 N 


healthy 


SCC3 


IIB 


6,77 


Alive 


M 


70,03 


Caucasian 


2262 T 


SCC 
















22he 


healthy 


LCC1 


IB 


35,73 


Deceased 


F 


56,58 


Caucasian 


22tu 


LCC 

















ADC: adenocarcinoma; LCC: large cell carcinoma; N/A: not applicable; SCC: Squamous cell carcinoma. 
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Figure 2 Global analysis of DNA methylation patterns in non-small cell carcinoma. (A) Correlation between experimental replicates. Each 
point represents the raw methylation signal (mean coverage in 10 bp bins). Density of points (log 10 scale) is shown in different shades of blue. 
Pearson's correlation coefficient is denoted in the top left corner of each scatter plot. (B) Correlation between tumor (x axis) and paired lung 
tissue sample (y axis). (C) Raw reads from a lung sample (above panel) and its matching tumor sample (middle panel) in the position 
chr2:1 76,716,200 to 176,738,910 viewed in the UCSC genome browser; normalized P-value is depicted in the bottom panel. Y-axis depicts the 
number of sequence reads in each region per sample. Dashed line represents the maximum number of reads (highest peak) in each sample; (D) 
Representation of the distribution of hypomethylated (green) and hypermethylated (red) regions across chromosomes in a tumor versus paired 
lung sample. (E) Composition of hypo- and hypermethylated regions in comparison to randomly sampled regions across the genome. Bars 
indicate the mean proportions of regions covered by each distinct genomic feature. Error bars denote standard error of the mean. Promoters 
were defined as regions ± 1 kb from all Ensembl transcription start site. Distinct classes of repeats were retrieved from the UCSC Table Browser 
(RepeatMasker table for hgl 8). Repeats were excluded from all subsequent features. Random regions were sampled across the genome 
independently seven times and each time their number was matched to the number of differentially methylated region in one sample pair. 
Statistical testing was done against random regions using one-tailed Student's t-test (*P-<0.01; **P-<0.001; ***p<0.0001). 
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of the LINE elements was similar between the tumors 
and paired lung counterparts. 

Differentially methylated regions in NSCLC 

We observed a total of 14,742 DMRs in the seven 
NSCLC tumor samples (Additional file 9), that is regions 
found to be differentially methylated in at least one 
tumor when compared with its lung counterpart with a 
two-fold change difference in methylation signal Per- 
forming unsupervised cluster analysis with these DMRs 
revealed that the samples clustered according to the 
histological subtype of the tumors (Figure 3). In particu- 
lar, hypermethylated DMRs were overrepresented in 
SCC samples. The dendogram of the DMRs indicated 
the presence of nine major subgroups of DMRs. Two of 
these were composed of generally hypermethylated (red) 
or hypomethylated (green) DMRs. These were plotted 
on the chromosomes, revealing that the 'red' DMRs 
were scattered throughout the chromosomes. By con- 
trast, the 'green' DMRs displayed a more localized distri- 
bution towards the chromosome ends (Figure 3B), 
showing on average 3.5~times greater enrichment than 
random regions (P=6.8 x 10 6 ). Subtelomeric hypomethy- 
lation has been found to be associated with risk of devel- 
oping cancer [28]. 

We found 57 DMRs present in all tumor samples, of 
which 50 were hypermethylated and 7 were hypomethy- 
lated. An example of two DMRs in all samples, including 
the replicate experiment can be found in Additional file 
10. We also analyzed DMRs in relation to NSCLC sub- 
types. We found 287 DMRs unique for SCC and 26 
DMRs unique for ADC (Additioonal File 11). These 
DMRs were classified by hyper- and hypomethylation 
and their genomic locations relative to genes (Figure 3B). 
We found that the ADC-specific DMRs had a genomic 
distribution similar to that observed for the DMRs 
present in all seven tumors, while for SCCs, DMRs within 
genes were concentrated in the gene body. SCC also pre- 
sented considerably more subgroup-specific DMRs than 
ADC and All-sample'; this could be explained by the no- 
tion that SCC is a relatively homogeneous tumor type. 

For the DMRs that fall in gene areas, we analyzed 
which types of genes were present in each of the three 
groups (Figure 3C and Additional file 11). We found that 
for the DMRs present in all tumor samples, more than 
75% of the associated genes belonged to the class of tran- 
scriptional regulators. For SCC-specific DMRs, we 
observed a considerable heterogeneity in the functional 
categories of associated genes. Nevertheless, there was an 
overrepresentation of genes involved in transcriptional 
regulation, organization of the cytoskeleton and cell cycle 
regulation. We note that while some of the DMRs were 
associated with genes that had been previously reported 
to be differentially methylated in lung tumors, such as 



APC, CDH13, CDKN2A, DAPK, hMLHl, HOX genes, 
OTX1, HOX2 and many others [22,29-32], we also found 
DMRs which, to the best of our knowledge, have not 
been previously reported to be associated with methyla- 
tion in NSCLC or other types of cancer. 

Validation of differentially methylated regions in NSCLC 
by bisulfite sequencing 

To validate the results obtained with MethylCap-seq, we 
selected five fragments for analysis by bisulfite sequen- 
cing. We chose CDKN2A and RASSF1 since hypermethy- 
lation of these genes has been widely reported in a variety 
of cancers including NSCLC [15,21,22], and the EMI pro- 
moter region since hypermethylation of EMI had been 
previously reported in a lung tumor cell line [29] but not 
yet in primary NSCLC tumors. Since we observed that 
the region upstream of EMI was highly hypermethylated 
in the tumors and involved a region of more than 15 kbp 
(Figure 4A), we selected two extra fragments within this 
region, in addition to the EMI promoter. The other two 
fragments were located at positions chr2:l 19,331,343 to 
119,331,692 and chr2:l 19,328,097 to 119,328,400 and we 
named them Frag_01 and Frag_02, respectively. Selection 
of the fragments for bisulfite sequencing analysis was 
based on the MethylCap-seq data (Figure 4). 

To assure sufficient depth of coverage for quantitative 
analysis, we sequenced 36 individual clones from each 
fragment and each bisulfite-converted DNA sample. 
Figure 4B shows an example of the methylation status 
for one fragment of a paired sample. The methylation 
status of the five fragments in all seven paired NSCLC/ 
lung tissue samples is summarized in Figure 4C; full 
data for all samples is available in Additional files 12 
and 13. We also examined potential differences in 
methylation according to tumor histology. To perform 
this analysis we grouped the samples in four categories: 
all clones from the lung tissues of patients (ADC_N and 
SCC_N) and clones from tumors (ADC_T and SCC_T), 
(Figure 4D and Additional file 14). We observed a sta- 
tistically significant difference (P< 0.0001), not only be- 
tween tumors and lung tissues, but also between ADC 
and SCC for all fragments analyzed by bisulfite sequen- 
cing. This identifies these fragments as candidate mar- 
kers for differentiating tumors versus paired lung tissues, 
and also between tumor subtypes. Moreover this finding 
validates the MethylCap-seq data, since the methylation 
status determined by bisulfite sequencing correlated 
quantitatively with the MethylCap-seq data (Figure 4E). 

Screening by methylation-specific PCR verifies 
methylation differences between tumor and paired lung 
tissues, and between tumor subtypes 

To screen the methylation status of selected DMRs in a 
larger sample set we used methylation-specific PCR 
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(See figure on previous page.) 

Figure 3 Differentially methylated region in non-small cell carcinoma versus paired lung tissue samples. (A) Heatmap of 14,742 most 
significant DMRs, picturing regions with mostly hypermethylated (top), mostly hypomethylated (middle), and mixes (bottom). Color bar at the 
bottom represents the log ratio of the normalized signal in tumor versus, lung (red= hypermethylation, green= hypomethylation). Dashed line 
represents cut in the dendogram, generating nine groups of DMRs. The DMRs from the two most distinctive groups are depicted on the 
chromosomes. Red: cluster containing regions hypermethylated in all samples; green: cluster containing regions hypomethylated in all samples. 
(B) Bar plot showing mean enrichment of DMRs at chromosome ends calculated as ratio between proportion of 1 Mbp region at chromosome 
ends and total proportion of each chromosome covered by DMRs. Error bars show standard error of mean of all chromosomes. Statistical testing 
was done against random regions using one-tailed Student's t-test (***P<0.0001). (C) DMRs distribution relative to gene position. Number of 
hyper- and hypomethylated genes and regions (outside gene area), and their distribution as in promoter, gene body and 3' UTR (top panel: all 
seven samples; middle panel: squamous cell carcinomas, bottom panel: adenocarcinoma). (D): Gene function distribution of the differentially 
methylated genes showing number of hyper- and hypomethylated genes relative to gene classes or function. DMR: differentially methylated 
regions; UTR: untranslated region. 
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(MSP) [33]. Here, we selected RASSF1, CDKN2A, 
Frag_01 and Frag_02 because they were the most signifi- 
cant differentially methylated fragments in the bisulphite 
sequencing experiments. We added ZIC4 because it was 
one of the DMRs not yet reported as methylated in 
NSCLC and presented a high difference in relative 
methylation score between lung and tumor samples. We 
screened 96 samples (48 NSCLC and paired lung sam- 
ples), and used totally methylated and totally unmethy- 
lated DNA samples as controls for the specificity of the 
MSP assays. Results for the seven paired samples used in 
MethylCap-seq are displayed in Figure 5. Data for all 
96 samples are summarized in Table 2 and shown in 
Additional file 15. 

MSP is a more qualitative approach than bisulfite se- 
quencing. In MSP, selective amplification depends on 
the methylation status of only the CpGs present in the 
designed primers, which were two or three CpGs de- 
pending on the fragment. For each methylated (M) pri- 
mer set, amplification of the fragment is dependent on 
methylation of the CpGs that are present in the se- 
quence targeted by the "M" primer; while each unmethy- 
lated (U) primer set requires unmethylated CpGs in the 
referred sequence for amplification. In the 48 paired 
samples analyzed, CDKN2A showed frequent amplifica- 
tion with the M primer set in lung samples (24 out of 
48), although the amplified bands were often of low in- 
tensity. Tumors had a much larger number displaying 
amplification with both M and U primer sets. Of the 47 
tumors, 36 displayed amplification with the M primer 
set, and two of the tumors displayed amplification only 
with the M primer set. For RASSF1 and ZIC4, 41 paired 
samples were available. Both fragments differentiated be- 
tween lung and tumor samples, whereas no methylation 
was detected in lung samples. ZIC4 showed methylation 
in more than 70% of the tumors, which is much higher 
than the 41% found for RASSF1. The findings for 
RASSF1 are in accordance with other studies [18]. Also, 
Frag_01 presented no amplification with the M primer 
set in the lung samples, in the Frag_02 M primer set 
only 5 out of 48 were amplified. In tumors, we observed 



fragment amplification in 29out of 48 samples for 
Frag_01 and 36out of 48 samples for Frag_02 with the M 
primer sets. Differences in methylation status were sig- 
nificant in tumors versus lung samples for all 5 frag- 
ments, with a ^-value<0.02 for CDKN2A and 
P < 0.0001 for the other four fragments (Figure 5B). 
Frag_01, Frag_02 and ZIC4 present a much lower P- 
value, indicating they may be better tumor markers than 
CDKN2A and RASSF1. We also observed that significant 
differences between SCC and ADC were only found for 
Frag_01 (Figure 5C). In conclusion, the MSP results val- 
idate the MethylCap-seq data, since the results obtained 
for all five fragments were consistent with the significant 
DMRs in all tumors, and Frag_01 appeared as part of the 
SCC-specific DMRs in the MethylCap-seq data. 

Discussion 

To date, genome-wide methylation studies in NSCLC 
have concentrated on specific areas of the genome [30], 
promoter regions [34-36] or sets of pre-selected candi- 
date genes [19,37]. Recently Kwon et al [38] published a 
report on genome-wide analysis of DNA methylation in 
NSCLC. However, they also concentrate the analysis on 
a selection of candidate genes. In our study, we per- 
formed an unbiased genome-wide DNA methylation 
analysis of seven individual patients with NSCLC and 
their adjacent lung tissues. We combined MethylCap 
with high-throughput sequencing (MethylCap-seq) to 
draw detailed methylome maps of NSCLC tumors and 
paired lung tissues. The replicate experiments showed 
that the depth of coverage generated by MethylCap-seq 
was sufficient to capture the methylation status of the 
entire genome. Moreover, there was a very high degree 
of correlation between the replicate experiments, dem- 
onstrating the robustness of the MethylCap-seq ap- 
proach. As controls for normalization and validation 
purposes we made use of fully methylated and fully 
unmethylated DNA samples. The MethylCap-seq data of 
these controls provide a useful resource for benchmark- 
ing of the fully methylated versus fully unmethylated 
status of any particular fragment in the genome. Using 
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Figure 4 Bisulfite sequencing validation for MethylCap-Seq. (A) Sequence reads for the Frag_01 (genomic region located at position Ch2: 
119,331,343 to 119,331,692) in a tumor (2214 T) and matching lung tissue (2213 N), plotted in the Genome Browser, showing the distribution of 
the 25 CpGs contained in the fragment highlighted in green. Dashed lines represent the highest number of reads in 2213 N and 2214 T. (B) 
Methyl ati on status of each CpG in all 36 individually sequenced clones in the same samples and fragment shown in Figure 4A. The middle row 
represents the average of methylation in all clones per CpG position. (C) Average of methylation of all clones sequenced per patient in each 
fragment (M: control totally methylated DNA; N:lung; T:tumor; U:control totally unmethylated DNA). (D) Average of the methylation status of the 
sum of all clones, grouped per histological subtype; comparison betweenadenocarcinoma and squamous cell carcinoma in all fragments were 
statistically significant. (E) Correlation between normalized methylation signals from MethylCap-seq and CpG methylation from bisulfite sequence. 
Different regions are shown in different colors; lung samples are marked by dots and tumors by stars. Pearson's correlation coefficient is denoted 
above the linear regression curve. *P<0.001. 



the MEDIPS computational tool [39], we generated 
methylation profiles of the seven paired NSCLC/lung 
cases which can be viewed in the genome browsers. 
This broad view of DNA methylation in NSCLC will 
provide new opportunities for the identification of 



specific epigenetic markers of NSCLC that could, for in- 
stance, be used for early detection of the disease. 

The occurrence of methylated CpGs in the fragments 
selected for bisulfite sequencing showed a high quantita- 
tive correlation with the reads found by MethylCap-seq 
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Figure 5 Methylation-specific polymerase chain reaction screening in 48 tumors and matching lung tissues. (A) Cropped gel images for 
the seven patients used for MethylCap-seq, grouped by histological subtype and controls (M: "methylated primer set"; N: lung; T: tumor; Tot_M: 
totally methylated control samples; Tot_U: totally unmethylated control sample; U: unmethylated primer set; o: blank/water). (B) Histogram of the 
methylation status in lung samples and tumor samples in all 48 patients in the five fragments analyzed. (C) Histogram of the methylation status 
per histological subtype (statistics were calculated with Fisher's exact test, two-tailed p-values: *P<0.02; **P<0.001 and ***P<0.01). 



in the same area. In conjunction with the MethylCap- 
seq data on the fully methylated and fully unmethylated 
controls, this demonstrates that MethylCap-seq yields 
reliable quantitative information on the methylation sta- 
tus of the fragments in a particular DNA sample. This is 
important considering that the tumor cell content of 
samples to be analyzed will be variable. Provided that 
sufficient sequencing depth is achieved, it should be pos- 
sible to use MethylCap-seq for the identification of 
tumor-specific hypermethylated DNA regions in samples 
containing only a minority of tumor cells, and possibly 
in circulating tumor DNA isolated from serum samples. 
The results obtained with MSP are qualitative in nature, 
but this PCR-based approach is very sensitive and may 
therefore be used to detect tumor-specific hypermethy- 
lated DNA fragments in serum DNA. As a first step in 
this direction, we used MSP to investigate the presence 
of five DMRs in a larger set of paired samples. The 
results confirmed the MethylCap-seq data and demon- 
strated that these DMRs are frequently hypermethylated 
in NSCLC. In particular, the Frag_01 fragment was 
found to be hypermethylated in 13 out of 16 SCC cases 
and in none of the ADC, suggesting that this might be a 
useful marker for this type of NSCLC, and ZIC4 showed 
100% specificity for tumors and higher sensitivity 



(>70%) than RASSF1. In addition, CDKN2A amplifica- 
tion with the M primer was often observed in lung tis- 
sue samples, although at much lower amplification levels 
when compared with tumor samples. This could reflect 
an early event in the methylation status of CDKN2A in 
tumorigenesis. 

The mapped reads point to differentially methylated 
areas across the genome, with some marked preferences 
for hypermethylation or hypomethylation in certain gen- 
omic regions. There is a strong correlation between CpG 
islands and hypermethylation in tumors, with the most 
significantly hypermethylated regions associated with 
promoter regions of genes. As previously reported by 
many groups [40-43], gene promoters are a target of 
methylation as an epigenetic regulatory mechanism. 
However, our data also reveal a high degree of hyper- 
methylation outside promoter areas at intergenic regions, 
and at long distances from genes. This indicates that pat- 
terns of DNA methylation may play a role not only by si- 
lencing promoters of key tumor suppressor genes, but 
also by regulating gene expression in a more complex 
manner through distant regulatory elements such as 
insulators and enhancers. 

We found that satellite regions were particularly hypo- 
methylated in the NSCLC tumors when compared with 



Table 2 Frequency of methylation by methylation-specific polymerase chain reaction in the 96 samples (48 tumor and matching lung tissue) in five fragments 



CDKN2A Frag_01 Frag_02 RASSF1 ZIC4 

count Sensitivity Specificity Count Sensitivity Specificity Count Sensitivity Specificity Count Sensitivity Specificity Count Sensitivity Specificity 



(n) 



(%) 



(%) 



(n) 



(%) 



(%) 



(n) 



(%) 



(%) 



(n) 



(%) 



(%) 



(n) 



(%) 



(%) 



Lung 24/48 75 

Tumor 36/48* 

Adenocarcinoma! 3/1 7 76.5 

Squamous 15/16 93.8 
cell carcinoma 



27 

46.4 
53.6 



0/47 62 
29/47** 

5/16 31.3 

13/16 81.3* 



100 

27.7 
72.2 



9/48 75 
36/48** 

12/17 70.6 

14/16 87.5 



56 

46.1 
53.9 



0/41 42 
17/41** 

8/15 53.3 

7/16 43.8 



100 

53.3 
46.7 



0/39 77 
30/39** 

7/13 53.9 

12/16 75.0 



100 

36.8 
63.2 



Sensitivity is calculated using the ratio between methylated (amplified) sample and the total amount of samples analyzed per fragment; specificity is the ratio of the difference between amplification in tumor and in 
the matching lung sample (or adenocarcinoma versus squamous cell carcinoma). Statistical significance of the difference between control lung samples and tumors, and between adenocarcinomas and squamous cell 
carcionmas were calculated using chi-square. *P<0.02; **P0.0001. 
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lung tissues. Recently, Ting et al [44] have shown that 
overexpression of satellite repeats is associated with dif- 
ferent types of cancer. The observed hypomethylation of 
these regions in NSCLC would be compatible with their 
results. We also found that hypomethylated regions are 
often present in subtelomeric regions, in agreement with 
the observation that hypomethylation is more frequently 
located near chromosome ends in lung cancer [45]. 
While CpG islands and genes are present in the subtelo- 
meric regions of chromosomes, the impact of subtelo- 
meric hypomethylation on the regulation of these genes 
is currently unclear. 

In an alternative approach, we selected the most sig- 
nificant DMRs in the individual NSCLC cases (14,742) 
and performed unsupervised cluster analysis of the 
hyper- and hypomethylated regions in the tumor sam- 
ples. We observed that the samples clustered according 
to their histological classification. This indicates that 
there are DMRs that could not only be used as NSCLC 
markers but also as markers for histological classification 
of NSCLC tumors. Subsequently we searched for the 
specific areas that were differentially methylated in all 
tumors when compared with the paired lung tissues, and 
specific for the subtypes. We observed that SCCs dis- 
played more DMRs than did ADCs did. This observation 
could be explained by the heterogeneity of ADCs, which 
would result in fewer DMRs shared by all three ADC 
samples. The contrary is seen in SCC, which is a more 
homogeneous type of NSCLC. This characteristic has 
been observed previously in gene expression profiling 
studies [4,46]. 

Remarkably, the majority (>75%) of the hypermethy- 
lated genes present in all seven samples encoded for 
transcription factors, while the shared hypomethylated 
genes were spread among many different functional cat- 
egories. Transcription factors play a central role in main- 
taining or modifying cell fate, either in normal processes 
such as development and cell differentiation, or in can- 
cer. Mechanisms of epigenetic regulation via DNA 
methylation are not yet completely understood; several 
factors such as DNA methyltransferases, chromatin re- 
modelling proteins and DNA-binding transcription fac- 
tors are involved [47,48]. Therefore, any perturbation 
that leads to a decrease in the expression status of those 
factors may disrupt important pathways for maintenance 
of the differentiated cellular state. Aberrant methylation 
can change chromatin structure, rendering DNA binding 
sites accessible or inaccessible to transcription factors 
leading to activation or silencing of genes important in 
cell differentiation and maintenance. It has been recently 
shown that hypermethylated genes found in lung tumors 
are associated with cellular morphogenetic differenti- 
ation [49]. These cellular mechanisms are orchestrated 
by transcriptional regulators; therefore changes in the 



methylation status of their binding sites may disrupt 
those processes thus contributing to oncogenesis. 

Conclusion 

Based on the MethylCap-seq data, we generated a list of 
candidate DMRs and associated genes specific for 
NSCLC and its subtypes (Table 3). Several of these 
DMR-associated genes are known to be hypermethylated 
in NSCLC and other types of cancer, strongly supporting 
the validity of the data set reported here. For instance, 
the observed methylation rates of RASSF1 were in agree- 
ment with previously reported results [18]. In the 
MetylCap-seq data, hypermethylation of the genomic re- 
gion where Frag_01 is located was most significant for 
the SCC samples. This was supported by the bisulfite 
sequencing and MSP validation experiments, where this 
fragment showed significantly more hypermethylation in 
SCCs when compared with ADCs or LCCs. In conclu- 
sion, this list of candidate DMR markers can be used to 
develop sensitive biological markers for NSCLC, which 
may enable non-invasive diagnosis and early detection 
of the disease, and potentially allow histological classifi- 
cation. Collectively, we provide a resource containing 
genome-wide DNA methylation maps of NSCLC and 
paired lung tissues, and comprehensive lists of DMRs 
and associated genes in NSCLC. 

Methods 

Patient samples 

Samples were obtained from the patients with NSCLC 
(n = 48) who had undergone surgical lung resection at 
Erasmus University Medical Center Rotterdam. Speci- 
mens were collected from the tumor and adjacent non- 
cancerous lung tissue and studied under an anonymous 
tissue protocol approved by the medical ethical commit- 
tee of Erasmus University Medical Center Rotterdam. 
Tissues were snap-frozen within two hours after surgical 
resection in liquid nitrogen pre-cooled isopentane, and 
stored at -196 °C or -80 °C until DNA extraction. 

Histopathological analysis 

Patient samples were independently reviewed by two 
pathologists. The cohort included 17 with ADC; 16 with 
SCC, seven with LCC , eight unclassified samples, and 
paired lung tissues for each tumor sample. 

Cell line 

The MRC-5 lung fibroblast-like cell line was used as 
control. Cells were cultured under standard conditions 
using minimum essential medium supplemented with 
10% heat-inactivated fetal bovine serum, 2 mM L- 
glutamine, 1% non-essential amino acids and penicillin/ 
streptomycin. Cells were harvested when they reached 
90% confluence. 
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Table 3 List of most differentially methylated genes in non-small cell carcinoma 



Gene 


Log 2 ratio 


Gene class/type/ 
biological function 


Gene name 


HGNC ID 


L.UnnilUN III all LUIIlUIS Dr\nA 1 


A 9. 


LldMSLI ipLIUM IcyUldlOl 


RARY hnmanhnv 1 
DAnA I IUII IcUUUX I 


nLDlNL-.^J J 


PA VO 


4.3 


transcription regulator 


paired box 9 


nuNL.oDZj 


Ul Al 


a n 
4.U 


transcription regulator 


orthodenticle homeobox 1 


urMr^Q^ j 1 
nLDl\IL..OJZ I 


IVrn j 


J.O 


G-protein 
coupled receptor 


natriuretic peptide receptor C/guanylate cyclase 
C (atrionatriuretic peptide receptor C) 


nUI\IL-./y4j 


FCF1 1 
rur \ Z 


J./ 


yiUVVLII IdLLUI 


IIUIUUIdbL yiUVVLII IdLLUI IZ 


noiNL-.JOOo 


UIVlL-U 1 Z 


J.O 


transcription regulator 


one cut homeobox 2 


nUINL-.O I Dy 


rnUlvl 1 H 


J.J 


transcription regulator 


PR domain containing 14 


MLdInL-. I ^RJU I 


pa y 


J.J 


LldNbLI ipilUN IcLjUldLUI 


IcLllld dllLl dlllcflUI llcUldl IUIL1 MUlllcUUUX 


nulNL. I OOOZ 


DnUAz 


J.J 


transcription regulator 


short stature homeobox 2 


i-ir~Mr~-i hr^/i 

nUINL.. I UOj4 


Ulvln Inz 


3 1 
J. I 


transcription regulator 


RK/1RT lil/o famik/ A") 

uivin i -iiKe laiiiiiy r\z 


i-ir;Mr-i 3qhr 

MLdInL.. I J^Uo 


CCD 1 1 A 

rtn 1 LH 


3 1 
J. I 


unknown 


fer-1-like 4 {Cdcnofhobditis. 
elegans) pseudogene 


i-ir;Mr-i ^rhi 

MLdInL.. I JOU I 


SIX6 


3.1 


transcription regulator 


SIX homeobox 6 


HGNC10892 


GATA3 


3 


transcription regulator 


GATA binding protein 3 


HGNC4172 


SK0R1 


3 


transcription regulator 


SKI familytranscriptional corepressor 1 


HGNC21326 


H0XA9 


3 


transcription regulator 


homeobox A9 


HGNC5109 


SALL1 


2.9 


transcription regulator 


sal-like 1 (Drosophila) 


HGNC:10524 


IRX2 


2.7 


transcription regulator 


iroquois homeobox 2 


HGNC14359 


GRIK2 


2.7 


ion channel 


glutamate receptor, ionotropic, kainate 2 


HGNC4580 


JAW DZ 


Z.O 


Lldl ISLI ipilOl 1 IcyUldLUI 


JM I D 1 1LH 1 IcUUUX Z 


r\\3\ NL..Z I Dj / 


AAFKl 
/WtD / 


j 

Z.J 


transcription regulator 


Meis homeobox 1 


nUI\IL../UUU 


\/A Y1 
VAA I 


j /i 
z.4 


transcription regulator 


ventral anterior homeobox 1 


nLDlNL.. 1 ZOOU 


TBX15 


2.3 


transcription regulator 


T-box 15 


HGNC:11594 


CNTD2 


-2.4 


unknown 


cyclin N-terminal domain containing 2 


HGNQ25805 


ZMYND10 


-2.6 


unknown 


zinc finger, MYND-type containing 10 


HGNC19412 


MYC 


-2.9 


transcription regulator 


v-myc myelocytomatosis viral 
oncogene homolog (avian) 


HGNC:7553 


TQDA MO 


J Q 

— z.y 


Plasma Membrane 


tetraspanin 9 


urKiroi A/in 
nLDlNL.Z 1 04U 


MA\/1 

/VAW / 


1 1 
—j. i 


unknown 


neuron navigator 1 


l"-ir"Mr"-i c:qqq 


rpcDo 
LtldJ 


3 A 
— J/+ 


unknown 


cytoplasmic polyadenylation 
element binding protein 3 


rlLDl\lL-.Z 1 /HO 


Unique for squamous TRAPPC9 
cell carcinomas 

AdHUz 


6.5 
6.3 


cell differentiation 
hydrolase 


trafficking protein particle complex 9 
abhydrolase domain containing 2 


HGNC:30832 
HGNC18717 


CTNND1 


6.2 


transcrption regulator 


catenin (cadherin-associated protein), delta 1 


HGNC2515 


HIST1H2BB 


6.2 


histone protein 


histone cluster 1, H2bb 


HGNC:4751 


EM PI 


5.8 


protein binding 


epithelial membrane protein 1 


HGNC3333 


TBL1XR1 


5.6 


transcription regulator 


transducin (beta)-like 1 X-linked receptor 1 


HGNC:29529 


NXPH1 


5.6 


protein binding 


neurexophilin 1 


HGNC:20693 


ZIC4 


5.6 


transcription regulator 


Zic family member 4 


HGNQ20393 


AOAH 


5.4 


hydrolase 


acyloxyacyl hydrolase (neutrophil) 


HGNC:548 


ACTN4 


5.4 


transporter 


actinin, alpha 4 


HGNC166 


C1orf21 


5.2 


unknown 


chromosome 1 open reading frame 21 


HGNC15494 


PACSIN2 


5.2 


transporter 


protein kinase C and casein 


HGNC:8571 



kinase substrate in neurons 2 
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Table 3 List of most differentially methylated genes in non-small cell carcinoma (Continued) 



PMM2 5.2 isomerase 

D0T1L 5.1 histone methyltransferase 

WWP2 5 e3 ubiquitin-protein ligase 

GTF3C1 5 transcription regulator 

MDN1 5 nuclear chaperone 

DID01 5 transcrption regulator 

HIST1H3C 5 histone protein 

ANKRD13B 4.9 unknown 

CALCB 4.9 hormone 

PTPRA 4.9 phosphatase 

STAT5A 4.8 transcription regulator 

LIMK1 4.8 cytoskeleton 

SLC23A2 4.7 transporter 

BARX1 4.5 transcription regulator 

NHS 4.5 unknown 

MTAP 4.5 glycosyltransferase 

F0XK1 4.5 transcription regulator 

PCMT1 4.4 O-methyltransferase activity 

SETD1A 4.3 transcription regulator 

CENPP 4.2 centromere protein 

KIAA1217 4.1 unknown 

SUTRK1 4.1 enhances neuronal 
dendrite outgrowth 

RORB 4 transcription regulator 

DLGAP1 4 postsynaptic scaffold in 
neuronal cells 

C3orf21 4 unknown 

ST6GALNAC1 4 protein glycosylation 

ZIC3 3.9 transcription regulator 

LYPLA 1 3.8 hydrolase 

REST 3.8 transcription regulator 

TMEM132D -2 unknown 

MAP1LC3B2 -2.1 microtubule 

CBY3 -2.2 unknown 

RASIP1 -2.3 signal transduction 

PRKG1 -2.3 kinase 

H/D/?72 -2.4 unknown 



phosphomannomutase 2 HGNC91 1 5 

DOTHike, histone H3 methyltransferase HGNC24948 
{Socchormyces. cerevisioe) 

WW domain containing E3 ubiquitin HGNO16804 
protein ligase 2 

generaltranscription factor NIC, polypeptide 1, HGNC4664 
alpha 220 kDa 

MDN1, midasin homolog (yeast) HGNC:18302 

death inducer-obliterator 1 HGNC:2680 

histone cluster 1, H3c HGNC:4768 

ankyrin repeat domain 13B HGNC:26363 

calcitonin-related polypeptide beta HGNC:1438 

protein tyrosine phosphatase, receptor type, A HGNC:9664 

signal transducer and activator of HGNC:1 1366 
transcription 5A 

LIM domain kinase 1 HGNC:6613 

solute carrier family 23 (nucleobase HGNC:10973 
transporters), member 2 

BARX homeobox 1 HGNC:955 

Nance-Horan syndrome (congenital cataracts HGNC:7820 
and dental anomalies) 

methylthioadenosine phosphorylase HGNC:7413 

forkhead box K1 HGNC:23480 

protein-L-isoaspartate (D-aspartate) HGNC:8728 
O-methyltransferase 

SET domain containing 1 A HGNC:29010 

centromere protein P HGNC:32933 

KIAA1217 HGNC:25428 

SLIT and NTRK-like family, member 1 HGNC:20297 

RAR-related orphan receptor B HGNC:10259 

discs, large (Drosophila) HGNC:2905 
homolog-associated protein 1 

chromosome 3 open reading HGNC:26639 
frame 21 

ST6 (alpha-N-acetyl-neuraminyl-2,3-beta- HGNC:23614 
galactosyl-1,3)-N-acetylgalactosaminide 
alpha-2,6-sialyltransferase 1 

Zic family member 3 (odd-paired HGNC:12874 
homolog, Drosophila) 

lysophospholipase I HGNC:6737 

RE1 -silencingtranscription factor HGNC:9966 

transmembrane protein 132D HGNC:2941 1 

microtubule-associated protein 1 light HGNC:34390 
chain 3 beta 2 

chibby homolog 3 {Drosophila) HGNC:33278 

Ras interacting protein 1 HGNC:24716 

protein kinase, cGMP-dependent, HGNC:9414 
type I 

WD repeat domain 72 HGNC:26790 
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Table 3 List of most differentially methylated genes in non-small cell carcinoma (Continued) 



Unique for 
Adenocarcinomas 



KCNQ2 

BAIAP3 
MAP3K10 

MSC 
FAM78B 

H0XA1 
SEPT9 
GAS1 
PTPRN2 

RSP02 

POU3F3 
TRPA1 

SYT6 
SLC6A2 

LHX1 
RAPGEF5 

GDF10 

C3orf45 

SLIT2 

ZNF423 

RHOF 



-2.7 

-2.8 
-30 

4.4 
4.1 

3.6 
3.3 
3.3 
3.1 

2.7 

2.6 
2.5 

2.3 
2.3 

2.1 
2.1 

2.1 

1.6 

1.6 

-1.6 

-1.8 



ion channel 



potassium voltage-gated channel, 
KQT-like subfamily, member 2 



G-protein coupled receptor BAN -associated protein 3 



kinase 

transcription regulator 
unknown 

transcription regulator 
enzyme 

Cell cycle/growth 
phosphatase 

Wnt receptor 
signaling pathway 

transcription regulator 

transporter 

transporter 
transporter 

transcription regulator 

small GTPase mediated 
signal transduction 

growth factor 

unknown 

differentiation/apoptosis 
transcription regulator 
actin filament organization 



mitogen-activated protein kinase 
kinase kinase 10 

musculin 

family with sequence similarity 78, 
member B 

homeobox A1 

septin 9 

growth arrest-specific 1 

protein tyrosine phosphatase, receptor 
type, N polypeptide 2 

R-spondin 2 homolog {Xenopus loevis) 

POU class 3 homeobox 3 

transient receptor potential cation 
channel, subfamily A, member 1 

synaptotagmin VI 

solute carrier family 6 

(neurotransmitter transporter, noradrenalin), 
member 2 

LIM homeobox 1 

Rap guanine nucleotide exchange factor 
(GEF) 5 

growth differentiation factor 10 

chromosome 3 open reading frame 45 

slit homolog 2 (Drosophila) 

zinc finger protein 423 

ras homolog gene family, member F 
(in filopodia) 



HGNC6296 

HGNQ948 
HGNQ6849 

HGNC7321 
HGNQ13495 

HGNQ5099 
HGNQ7323 
HGNC4165 
HGNQ9677 

HGNQ28583 

HGNC9216 
HGNQ497 

HGNC18638 
HGNQ11048 

HGNQ6593 
HGNC16862 

HGNC4215 

HGNQ26781 

HGNQ11086 

HGNC16762 

HGNC15703 



Genes were selected based on the average of log 2 ratios from the samples in each group. Hypermethylated genes common in all tumors and unique for the ADC 
groups were selected based on log 2 ratio > 2.5; hypomethylation included all genes from those groups. For SCC we selected the first 50 genes with the highest 
log 2 ratios in hypermethylation, and log 2 ratio < -2.0 for hypomethylated genes. Gene classes and their correspondent biological functions were retrieved from 
Gene Ontology (http://www.geneontology.org/) and Protein Knowledgebase UniProtKb (http://www.uniprot.org/). 



DNA isolation 

Genomic DNA from patient tissues and cultured cells 
were extracted by overnight treatment incubation with 
lysis buffer and proteinase K, followed by phenol- 
chloroform extraction, ethanol precipitation and RNase 
digestion. 

Artificial demethylation and methylation of genomic DNA 

We used DNA extracted from the MRC-5 cell line and 
commercially available Universal unmethylated DNA 
(UUD; Millipore(Billerica, MA, USA). Fully unmethy- 
lated DNA was obtained by whole-genome amplification 
using the REPLI-g kit (QiagenQiagen - Germantown, 
MD, USA) according to manufacturer protocol, followed 
by phenol-chloroform extraction. Fully methylated DNA 
was prepared by treating MRC-5 DNA and UUD with 



M.SssI enzyme (New England Biolabs - Ipswich, MA, 
USA) according to the manufacturer s protocol. In short, 
10 ug of DNA was incubated for 2 hours at 37°C with 
40 U of M.SssI and 640 uM of S-adenosylmethionine. 
DNA was then treated with phenol-chloroform and 
recovered by ethanol precipitation. 

Methyl-DNA capture 

Enrichment of methylated DNA was carried out using 
the MethylCap technique. DNA samples were sheared 
by sonication to obtain fragments between 200 and 
800 bp. Methylated DNA capture was carried out using 
an adapted protocol from the MethylMiner Methylated 
DNA Enrichment kit (Invitrogen - Carlsbad, CA, USA). 
Two adaptations were made. Firstly, 20 uL of beads and 
14 uL (7 ug) of MBD-biotin protein were used for 5 ug 
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of sheared genomic DNA. Secondly, a single elution 
fraction was obtained by resuspending the beads in 
200 uL of IX binding/ washing buffer containing 2 uL of 
proteinase K (20 mg/ml). The samples were then incu- 
bated for 90 minutes at 57 °C with 800 rpm agitation. 
Remaining steps were performed following the manufac- 
turer s protocol scaled up to 5 ug of DNA. Before the 
samples were used for high-throughput sequencing, 
methyl-enrichment was tested by quantitative (q) PCR 
(data not shown). qPCR was carried out using SYBR r 
green, Phire Hot Start DNA polymerase (Finnzymes - 
Vantaa, Finland)), and 1 uL of the enriched samples. Pri- 
mer sequences are available in Additional file 16. qPCR 
was performed in triplicate and enrichment levels were 
calculated as previously described [50]. 

High-throughput sequencing 

High-throughput sequencing (HTS) was carried out on 
18 samples: seven tumors (three ADC, three SCC and 
one LCC), seven paired lung tissues and four controls 
(two artificially fully methylated DNA and two fully 
unmethylated DNA). For each sample, DNA recovered 
from two independent Methyl Capture experiments was 
sequenced. The Illumina Genome Analyzer IIx platform 
(San Diego, CA, USA was used for both replicates, one 
sample per lane, according to the manufacturer s proto- 
col. In short: after MethylCap, fragments of the methyl- 
enriched fraction were end-repaired and ligated to 
single-read adaptors. Samples were size-selected to an 
average of 320 bp, PCR-amplified and 36 bp sequenced. 

The Illumina Casava pipeline was used for base call- 
ing, alignment and quality control. Results were mapped 
against the reference Human_UCSChgl8_AllChromo- 
somes using eland_extended by Illumina pipeline 1.6.0. 

Bioinformatic analysis 

Detailed information about the evaluation and selec- 
tion of normalization parameters, and identification, 
characterization and selection of DMRs can be found 
in Additional file 3. 

Bisulfite sequencing 

Bisulfite sequencing was carried out on the same 16 
samples used for MethylCap HTS. Detailed information 
is provided in Additional file 3. In short: bisulfite conver- 
sion was carried out using the Epitect kit from Qiagen 
following the manufacturers protocol. Samples were 
amplified using primers specific for the fragments of 
interest (Additional file 16). Amplified fragments were 
loaded on 2% agarose gels and extracted from the gel 
using a NucleoSpin Extract II kit (Machery-Nagel - 
Duren, Germany). Fragments were then ligated to 
pGEM-t easy vector (Promega - Madison, WI, USA) and 
cloned into DH12-p competent cells. Colony PCR was 



performed on 36 colonies and PCR fragments 
sequenced. Analysis of methylated and unmethylated 
CpGs was executed using two online programs: BISMA 
(http://biochem.jacobs-university.de/BDPC/BISMA) and 
QUMA (http://quma.cdb.riken.jp). 

Methylation-specific PCR 

To determining the methylation status of the selected 
regions, we used MSP [33] in 96 samples (48 tumors and 
matching lung tissues). Samples were bisulfite converted 
as described for bisulfite sequencing, and amplified using 
two different set of primers designed for the methylated 
and unmethylated sequences (Additional file 16). The 
fully methylated and fully unmethylated DNA samples 
were used as controls, and a water blank reaction used as 
control for contamination. After amplification, products 
were resolved on 2% agarose gels containing ethidium 
bromide and visualized under UV transillumination. 

Statistical analysis 

Statistical analysis for bisulfite sequencing was provided 
by the BISMA program. In short, Fisher s exact test was 
used for the statistical significance between two bisulfite 
sequence groups at each CpG, and a Mann- Whitney U- 
test for the statistical significance between two groups of 
the entire set of CpG sites. Two-tailed P-value of Fisher s 
exact test was calculated from the 2x2 tables at each 
CpG site. A Mann- Whitney U-test was used for the stat- 
istical significance of the entire set of CpG sites between 
the tumor and lung groups. For determination of signifi- 
cance in the MSP, we used chi-square distribution. Two- 
tailed P-values were determined by the counts of the 
amplified bands for each primer in the group of samples 
lung/ tumor or SCC/ ADC. Groups that showed a P- value 
< 0.05 were considered significantly different. 

Additional files 



Additional file 1: TableSL Data from all samples used in this study. 

Additional file 2: TableS2. High-throughput sequencing reads from all 
samples and replicate experiments. 

Additional file 3: Methods. Detailed description of Bioinformatics 
approach and methods [51-55]. 

Additional file 4: FigureSI. Selection of normalization method. 
Distribution of Pearson's correlation coefficients between replicates for 
different normalization methods. 

Additional file 5: FigureS2. Correlation between technical replicates. 
Scatter plots comparing raw MethylCap-seq signal from two technical 
replicates (R1 and R2) for all 14 samples. Each point represents one 10 bp 
genomic bin. Density of points (Iog10 scale) is shown in different shades 
of blue. Pearson's correlation coefficient is denoted in the top left corner 
of each scatter plot. 

Additional file 6: FigureS3. Correlation between lung and tumor 
samples. Scatter plots comparing raw MethylCap-seq signal from healthy 
and tumor tissue for six pairs of samples. Each point represents one 



Hughes Carvalho et al. Epigenetics & Chromatin 2012, 5:9 
http://www.epigeneticsandchromatin.eom/content/5/1/9 



Page 16 of 18 



10 bp genomic bin. Density of points (log 10 scale) is shown in different 
shades of blue. Pearson's correlation coefficient is denoted in the top left 
corner of each scatter plot. 

Additional file 7: FigureS4. Selection of P-value threshold. Number of 
significant hypo- and hypermethylated regions for different P-value 
thresholds. Y-axis shows number of significant regions on Iog10 scale for 
decreasing P-value threshold, shown on the x-axis. 

Additional file 8: FigureS5. along each chromosome for each paired 
sample. Representation of the log-ratio of relative methylation scores for 
tumors vs. paired lung samples for each sample pair showing DMRs 
along each chromosome. Green colors marks hypomethylation in tumor, 
negative log-ratio <2, and red colors marks hypermethylation in tumor, 
positive log-ration >2. 

Additional file 9: TableS3. Table with all DMRs found in this study. 
14472 DMRs with chromosomal coordinates, log2 ratio, position where 
DMR is located regarding to genes and gene identities for DMRs that are 
in gene areas. 

Additional file 10: FigureS6. MethlCap-seq reads of two DMRs in all 
seven samples and replicate experiments. All 14 samples and two 
replicate experiments for two DMRs that were present in all tumors. (A) 
Sequence reads plotted in the UCSC genome browser spanning 1 0Okbp 
in chromosome 2:63,090,001 to 63,190,000; (B) Sequence reads spanning 
100 kbp in chromosome 14: 60,000,001 to 60,100,000. Reproducibility can 
be observed by the similarity between R1 and R2 of each sample. Red 
bar highlight hypermethylated DMRs in each chromosome found to be 
present in all seven tumor samples. 

Additional file 11: FileSI . Tables with hyper- and hypomethylated 
regions separated in "All tumors", SCC and ADC subtypes. Tables show 
hyper- and hypomethylated regions found present in all tumors or that 
were either unique for SCC or ADC, separated per tumor subtype and 
DMR type. 

Additional file 12: FigureS7. Methylation status of CpGs per paired 
sample in all five fragments. A) Model of analysis. Blue = unmethylated; 
red = methylated. CpGs are horizontally ordered and clones vertically 
ordered. Per paired samples in each fragment, the average of 
methylation of each CpG is calculated and represented from blue to red 
depending on the percentage of clones that were methylated in the 
specific CpG. Top panel represents lung samples, bottom panel: tumor 
samples. B) All paired analyses for CDKN2A, RASSF1, Frag_01, Frag_02 
and EN1. 

Additional file 13: FileS2. Tables with bisulfite sequence analysis of 
tumor versus lung. Counts and statistics per CpG for all clones in 
CDKN2A, RASSF1, Frag_01, Frag_02 and EN1 fragments. 

Additional file 14: FileS3. Tables with bisulfite sequencing statistic 
analysis of SCC versus ADC. Statistics per CpG for all pooled SCC samples 
versus pooled ADC samples. 

Additional file 15: FigureS8. MSP screening. Cropped gel images for all 
48 samples grouped by histological subtype in the five fragments 
analyzed: 17 ADC, 16 SCC, 7 LCC, 8 others (other types and unclassified 
NSCLC) M: primer for methylated CpG; N: normal/healthy; T: tumor; *: 
samples used in MethylCap-seq; TM: Totally methylated control sample; 
TU: Totally unmethylated control sample; 0: blank/water; U: primer for 
umethylated CpG. 

Additional file 16: TableS4. Primers. Sequence and conditions of 
primers used for Bisulfite sequencing and MSP. 
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